AITopics | thompson sampling approach

Collaborating Authors

thompson sampling approach

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Neural Information Processing SystemsNov-21-2025, 15:13:54 GMT

We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the optimal stationary policy for the sampled model for the rest of the episode. The duration of each episode is dynamically determined by two stopping criteria.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

Reviews: Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Neural Information Processing SystemsOct-7-2024, 21:32:00 GMT

The paper proposes TSDE, a posterior sampling algorithm for RL in the average reward infinite horizon setting. This algorithm uses dynamic episodes but unlike Lazy-PSRL avoids technical issues by not only terminating an episode when an observation count doubled but also terminating episodes when they become too long. This ensures that the episode length cannot grow faster than linear and ultimately a Bayesian regret bound of O(HS(AT) .5) is shown. Posterior sampling methods typically outperform UCB-type algorithms and therefore a posterior sampling algorithm for non-episodic RL with rigorous regret bounds is desirable. This paper proposes such an algorithm, which is of high interest.

algorithm, learning unknown markov decision process, thompson sampling approach, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

Ouyang, Yi, Gagrani, Mukul, Nayyar, Ashutosh, Jain, Rahul

Neural Information Processing SystemsFeb-14-2020, 07:43:36 GMT

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback